[BugFix] Fix legacy logger propagation leaking unformatted logs to stderr#7617
[BugFix] Fix legacy logger propagation leaking unformatted logs to stderr#7617gongweibao wants to merge 3 commits intoPaddlePaddle:developfrom
Conversation
…derr Legacy loggers (legacy.*) had propagate=True (commented out since PaddlePaddle#3431), causing all log levels to propagate to the root logger and output to stderr without timestamps. This made downstream log parsers unable to extract timestamps from stderr log files. Set propagate=False for both get_trace_logger and _get_legacy_logger so logs only go through their configured handlers with proper formatting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for your contribution! |
There was a problem hiding this comment.
Pull request overview
该 PR 修复 get_trace_logger() / _get_legacy_logger() 创建的 legacy.* logger 默认向 root logger 传播,导致日志以 root logger 的默认格式泄漏到 stderr(缺少期望的格式/时间戳)的问题,使 legacy logger 的输出与 FastDeploy 的日志体系隔离。
Changes:
- 在
get_trace_logger()与_get_legacy_logger()中显式设置logger.propagate = False - 更新单元测试断言,使其与新的传播行为一致
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fastdeploy/logger/logger.py | 禁用 legacy logger 向 root logger 的传播,避免 stderr 被未格式化日志污染 |
| tests/logger/test_logger.py | 更新 legacy logger 的 propagate 断言以匹配修复后的行为 |
Comments suppressed due to low confidence (1)
tests/logger/test_logger.py:88
- 本次同时修改了
get_trace_logger()的logger.propagate行为(改为 False),但当前测试只覆盖了_get_legacy_logger()的 propagate。建议在test_get_trace_logger_basic(或单独用例)里也断言logger.propagate为 False,避免后续回归。
legacy_logger = self.logger._get_legacy_logger("test", "test.log")
self.assertFalse(legacy_logger.propagate)
def test_get_trace_logger_basic(self):
"""Test basic functionality of get_trace_logger"""
logger = self.logger.get_trace_logger("test_trace", "trace_test.log")
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #7617 +/- ##
==========================================
Coverage ? 71.68%
==========================================
Files ? 419
Lines ? 57886
Branches ? 9078
==========================================
Hits ? 41498
Misses ? 13559
Partials ? 2829
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CI报告基于以下代码生成(30分钟更新一次): 1 任务总览有 1 个 Required 任务失败,2 个 Required 任务运行中,需等待完成后确认最终状态。
2 任务状态汇总2.1 Required任务 : 7/10 通过
2.2 可选任务 — 24/26 通过
3 失败详情(仅 required)Run Four Cards Tests / run_4_cards_tests — 测试失败(置信度: 中)根因详情: 失败用例:
关键日志: 修复建议:
关联变更: PR 修改了 |
| # INFO/DEBUG logs go to stdout | ||
| "default": { | ||
| "class": "colorlog.StreamHandler", | ||
| "stream": "ext://sys.stdout", | ||
| "formatter": "custom", | ||
| }, | ||
| # ERROR+ logs go to stderr | ||
| "error": { | ||
| "class": "colorlog.StreamHandler", | ||
| "stream": "ext://sys.stderr", | ||
| "level": "ERROR", | ||
| "formatter": "custom", | ||
| }, | ||
| }, | ||
| "loggers": { | ||
| "uvicorn": { | ||
| "level": "INFO", | ||
| "handlers": ["default"], | ||
| "handlers": ["default", "error"], | ||
| "propagate": False, | ||
| }, | ||
| "uvicorn.error": { | ||
| "level": "INFO", | ||
| "handlers": ["default"], | ||
| "handlers": ["default", "error"], | ||
| "propagate": False, |
There was a problem hiding this comment.
当前 dictConfig 下同时把 "default"(stdout) 和 "error"(stderr) 两个 handler 绑定到 uvicorn / uvicorn.error,但 "default" handler 没有过滤 ERROR 级别,导致 ERROR+ 日志会同时输出到 stdout 和 stderr(重复一份),与注释“ERROR+ logs go to stderr”不一致。建议给 stdout handler 增加“仅低于 ERROR”的 filter(或等价的过滤方案),确保 ERROR+ 只走 stderr。
| # Prevent implicit basicConfig() from adding a stderr handler when | ||
| # module-level logging.info/warning() is called with no root handlers. | ||
| # A NullHandler on root satisfies Python's "has handlers" check. | ||
| if not logging.root.handlers: | ||
| logging.root.addHandler(logging.NullHandler()) | ||
|
|
There was a problem hiding this comment.
这里给 root logger 在“无 handler”时添加 NullHandler,会抑制 Python 的 lastResort 输出,并且会让后续用户/上层应用调用 logging.basicConfig() 变得无效(因为 root 已经有 handler 了)。这属于对全局 logging 的侵入式修改,可能导致第三方库/未配置 logger 的告警和错误静默丢失。建议避免直接改 root(例如:修复触发 root 的 module-level logging.* 调用、或只在 fastdeploy 自己的 logger 树上加 handler / filter,或提供开关并在文档中说明行为变化)。
| # Prevent implicit basicConfig() from adding a stderr handler when | |
| # module-level logging.info/warning() is called with no root handlers. | |
| # A NullHandler on root satisfies Python's "has handlers" check. | |
| if not logging.root.handlers: | |
| logging.root.addHandler(logging.NullHandler()) |
| # Prevent implicit basicConfig() from adding a stderr handler when | ||
| # module-level logging.info/warning() is called with no root handlers. | ||
| # A NullHandler on root satisfies Python's "has handlers" check. | ||
| if not logging.root.handlers: | ||
| logging.root.addHandler(logging.NullHandler()) | ||
|
|
There was a problem hiding this comment.
PR 描述中写的是仅恢复 legacy logger 的 propagate=False 并更新测试,但实际还包含:修改 setup_logging(root NullHandler)、调整 OpenAI 入口的 uvicorn log_config/输出流、以及 scheduler_metrics_logger 输出流等。建议补充 PR 描述说明这些额外改动的动机与影响范围,或拆分为独立 PR,避免 reviewer/发布时遗漏风险评估。
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-04-27 15:50:01
📋 Review 摘要
PR 概述:修复 legacy.* 命名空间日志器因 propagate=True 导致未格式化日志泄漏到 stderr 的问题,同时统一全局日志的 stdout/stderr 路由策略。
变更范围:fastdeploy/logger/、fastdeploy/__init__.py、fastdeploy/entrypoints/openai/、fastdeploy/engine/sched/、tests/logger/
影响面 Tag:Logger Infra
📝 PR 规范检查
标题 [BugFix] Fix legacy logger propagation leaking unformatted logs to stderr 准确描述了修复内容,描述结构完整,包含 Motivation / Modifications / Usage / Accuracy Tests / Checklist,规范合规。✓
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🟡 建议 | fastdeploy/logger/logger.py:38 |
_MaxLevelFilter 在三处重复定义,建议抽取复用 |
| 🟡 建议 | tests/logger/test_logger.py:105 |
test_get_trace_logger_with_console handler 数量断言可能错误(3 → 4) |
| ❓ 疑问 | fastdeploy/engine/sched/scheduler_metrics_logger.py:52 |
原 StreamHandler() 默认输出 stderr,改为 sys.stdout 后输出目标变更,PR 描述未说明 |
🟡 _MaxLevelFilter 三处重复定义
_MaxLevelFilter 在以下三处各自独立定义,实现完全相同:
fastdeploy/__init__.py(_MaxLevelFilter)fastdeploy/logger/logger.py(_MaxLevelFilter,本次新增)fastdeploy/logger/setup_logging.py(MaxLevelFilter,含轻微差异:支持字符串 level)
建议统一提取到 fastdeploy/logger/setup_logging.py 或 fastdeploy/logger/filters.py 中,其他模块 import 复用,避免后续三处逻辑出现分歧。
🟡 test_get_trace_logger_with_console handler 数量断言疑似失配
tests/logger/test_logger.py:105(未在本 PR diff 中修改):
self.assertEqual(len(logger.handlers), 3) # main log + error log + console但 _add_console_handlers(本 PR 新增)会添加 2 个 handler(stdout_handler + stderr_handler),加上 file + error_file = 4 个,与注释和断言的 3 不符。
若该测试实际已在 109 个测试中通过,请说明原因;否则建议将断言修改为:
self.assertEqual(len(logger.handlers), 4) # main log + error log + stdout + stderr❓ SchedulerMetricsLogger 输出流由 stderr 改为 stdout 未在描述中说明
fastdeploy/engine/sched/scheduler_metrics_logger.py:
原代码 logging.StreamHandler() 默认写入 stderr,本 PR 改为 logging.StreamHandler(sys.stdout) 后写入 stdout。这是一个行为变更(metrics 日志目标发生变化),建议在 PR 描述的 Modifications 段中补充说明,以便下游依赖 stderr 采集 metrics 的工具评估影响。
总体评价
本 PR 核心修复(取消注释 logger.propagate = False)方向正确,同时通过 setup_logging 对 legacy 命名空间提前隔离增加了防御性。建议在合并前确认 test_get_trace_logger_with_console 的 handler 计数是否符合预期,并在 PR 描述中补充 SchedulerMetricsLogger 输出流变更的说明。
| _LOG_FORMAT = "%(levelname)-8s %(asctime)s %(process)-5s %(filename)s[line:%(lineno)d] %(message)s" | ||
|
|
||
|
|
||
| class _MaxLevelFilter(logging.Filter): |
There was a problem hiding this comment.
🟡 建议 _MaxLevelFilter 在本文件(logger.py)、fastdeploy/__init__.py 和 fastdeploy/logger/setup_logging.py(名为 MaxLevelFilter)中各自定义了一遍,实现完全相同,存在三处重复。
建议将该类统一放在 fastdeploy/logger/setup_logging.py 或单独的 fastdeploy/logger/filters.py 中,其他地方 import 复用,避免后续维护时三处逻辑不一致。
Motivation
Legacy loggers (
legacy.*) created byget_trace_logger()and_get_legacy_logger()hadpropagateleft asTrue(commented out since PR #3431). This caused all log levels (INFO/WARNING/ERROR) to propagate to the Python root logger, which outputs to stderr using the defaultLEVEL:name:messageformat — without timestamps.Impact:
*-stderr.log) are polluted with INFO-level logs that shouldn't be therepaddlerl diagnose) to display(no ts)and fail to filter events by timeRoot cause: PR #3431 introduced the
legacy.*namespace (logging.getLogger(f"legacy.{name}")) but keptlogger.propagate = Falsecommented out with the note "maintain original logic". The original code didn't need it because loggers were at the top level. After the namespace change,legacy.api_serverbecame a child of the root logger, and propagation leaked all logs to stderr.Modifications
logger.propagate = Falsein bothget_trace_logger()and_get_legacy_logger()assertTruetoassertFalseto match the fixUsage or Command
No usage change. After this fix:
%(asctime)stimestamps)Accuracy Tests
N/A — logging-only change, no impact on model outputs.
Checklist
pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.